feat: short-term memory system #52
Conversation
Adds deterministic short-term memory with three storage mechanisms:
- Auto-store from tool responses via the memory_hint field
- Explicit memory_short tool (store/get/delete/list actions)
- HTTP API endpoints for external access

Backend: src/caal/memory/ package with file-based JSON persistence, singleton pattern, TTL support, and context injection into the LLM.

Frontend: Memory Panel UI with Brain icon button, entry list, detail modal, and clear-all functionality. Includes i18n translations for en, fr, it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
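For orientation, a minimal sketch of the shape such a store could take, assuming a file-backed JSON dict with lazy TTL expiry; the class and method names here are illustrative and don't necessarily match the real src/caal/memory/ package:

```python
import json
import time
from pathlib import Path

_UNSET = object()  # sentinel: "caller didn't pass a TTL"

class ShortTermMemory:
    """Illustrative file-backed store; not the actual src/caal/memory/ code."""

    _instance = None  # singleton, per the commit message

    def __init__(self, path: Path, default_ttl: int = 604800):
        self.path = path
        self.default_ttl = default_ttl  # 7 days, per this PR
        self._entries = json.loads(path.read_text()) if path.exists() else {}

    @classmethod
    def instance(cls, path: Path) -> "ShortTermMemory":
        if cls._instance is None:
            cls._instance = cls(path)
        return cls._instance

    def store(self, key, value, ttl=_UNSET):
        # ttl omitted -> default 7d; ttl=None -> never expires; int -> custom.
        if ttl is _UNSET:
            expires = time.time() + self.default_ttl
        elif ttl is None:
            expires = None
        else:
            expires = time.time() + ttl
        self._entries[key] = {"value": value, "expires": expires}
        self._flush()

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        if entry["expires"] is not None and entry["expires"] < time.time():
            self.delete(key)  # lazily expire on read
            return None
        return entry["value"]

    def delete(self, key):
        if self._entries.pop(key, None) is not None:
            self._flush()

    def list(self):
        # Copy keys first: get() may delete expired entries while we iterate.
        return [k for k in list(self._entries) if self.get(k) is not None]

    def _flush(self):
        self.path.write_text(json.dumps(self._entries))
```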
- Change default TTL from 24h to 7 days (604800s)
- Allow tools to specify a custom TTL in memory_hint (see the examples after this commit message):
- Simple value: uses default 7d TTL
- {"value": ..., "ttl": seconds}: custom TTL
- {"value": ..., "ttl": null}: no expiry
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
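For concreteness, here are hypothetical tool responses carrying each of the three shapes; the surrounding field layout (a memory_hint dict keyed by memory name) is an assumption, only the TTL semantics come from the commit above:

```python
# Hypothetical tool responses showing the three memory_hint shapes.

# Simple value: stored with the default 7-day TTL.
{"result": "Flight found", "memory_hint": {"flight_number": "UA123"}}

# {"value": ..., "ttl": seconds}: custom TTL (here, one hour).
{"result": "Flight found",
 "memory_hint": {"flight_number": {"value": "UA123", "ttl": 3600}}}

# {"value": ..., "ttl": null}: no expiry (JSON null is None in Python).
{"result": "Saved", "memory_hint": {"home_airport": {"value": "SFO", "ttl": None}}}
```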
Replace the linear execute → stream → retry flow with a loop that supports multi-step tool chaining. The model can now: call tool A → get result → call tool B → get result → generate a text response.

Previously, after one tool execution the code tried to stream a text response. If the model wanted to chain (call another tool), it produced 0 text chunks, triggering a retry without tools that crashed Ollama (tool references in messages but no tools registered).

New flow (sketched below):
- Loop non-streaming chat() calls (max 5 rounds)
- Each round: if tool_calls → execute → loop back
- When no tool_calls → yield content or stream the final response
- Safety fallback: _strip_tool_messages converts tool messages to plain text if Ollama still crashes on the streaming path

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
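A rough sketch of that loop shape, assuming stand-in helpers chat (non-streaming), stream_chat (streaming), execute_tool, and the _strip_tool_messages fallback named above; this is not the literal CAAL code:

```python
MAX_ROUNDS = 5  # cap on chained tool rounds, per the commit message

async def run_turn(messages: list[dict], tools: list[dict]):
    for _ in range(MAX_ROUNDS):
        # Non-streaming round so tool_calls can be inspected before streaming.
        reply = await chat(messages, tools=tools)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            if reply.get("content"):
                yield reply["content"]  # model already produced the answer
            else:
                try:
                    async for chunk in stream_chat(messages, tools=tools):
                        yield chunk
                except Exception:
                    # Safety fallback: flatten tool messages to plain text
                    # if Ollama still crashes on the streaming path.
                    async for chunk in stream_chat(_strip_tool_messages(messages)):
                        yield chunk
            return
        # Tool round: record the assistant's calls, execute, and loop back.
        messages.append({"role": "assistant", "content": "", "tool_calls": tool_calls})
        for call in tool_calls:
            result = await execute_tool(call)  # deterministic n8n workflow, etc.
            messages.append({"role": "tool", "content": result})
```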
…o context

- Deduplicate identical tool calls within a single round (same name + args; sketched below)
- Accumulate tool names/params across chained rounds for the frontend indicator
- Keep the tool indicator showing after the response (don't clear it when tools were used)
- Include tool call arguments in ToolDataCache context injection

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
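The dedup in the first bullet can be as simple as keying each call on its name plus canonicalized arguments; a sketch, assuming a flat {name, arguments} call shape:

```python
import json

def dedupe_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Drop repeated calls with the same name and arguments within one round."""
    seen: set[tuple[str, str]] = set()
    unique = []
    for call in tool_calls:
        # Serialize args with sorted keys so equal dicts produce equal keys.
        key = (call["name"], json.dumps(call.get("arguments", {}), sort_keys=True))
        if key not in seen:
            seen.add(key)
            unique.append(call)
    return unique
```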
Memory file writes were failing with permission denied because /app is owned by root. The store now uses CAAL_MEMORY_DIR=/app/data (the caal-memory volume), and the entrypoint ensures the directory is writable by the agent user.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
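On the Python side this presumably reduces to resolving the directory from the environment, roughly as follows (the file name is an assumption):

```python
import os
from pathlib import Path

# CAAL_MEMORY_DIR points at the caal-memory volume; /app/data is the
# default documented in this commit. "memory.json" is illustrative.
memory_dir = Path(os.environ.get("CAAL_MEMORY_DIR", "/app/data"))
memory_dir.mkdir(parents=True, exist_ok=True)  # entrypoint also ensures writability
memory_file = memory_dir / "memory.json"
```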
Prevents the LLM from using memory data in the initial greeting. Memory context is now skipped when there are no user messages yet. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
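A sketch of that guard, assuming a hypothetical inject_memory_context helper:

```python
def inject_memory_context(messages: list[dict], memory_block: str) -> list[dict]:
    # Skip injection until the user has actually said something,
    # so the greeting can't leak memory contents.
    if not any(m.get("role") == "user" for m in messages):
        return messages
    return [{"role": "system", "content": memory_block}, *messages]
```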
…haining

Context injection helps the LLM know what's in memory so it can chain tools correctly (e.g. memory_short → flight_tracker). Without it, the model may skip memory and go straight to other tools.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…missions

- Memory detail modal now has a pencil icon to edit values in place
- Add a registry_cache.json symlink to entrypoint.sh (same pattern as settings.json) to fix permission denied on /app/registry_cache.json

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ministral-3's recommended instruction temperature is 0.15. The old 0.7 default was overriding the Modelfile setting on every API call.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
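With the Ollama Python client, anything passed in options wins over the Modelfile parameter, which is how a hard-coded 0.7 default kept clobbering it. A hedged sketch of the corrected call (the model name is an assumption):

```python
import ollama

# Per-call options override Modelfile parameters, so the temperature sent
# here must match the intended 0.15 rather than a stale 0.7 default.
response = ollama.chat(
    model="ministral-3",  # illustrative model name
    messages=[{"role": "user", "content": "Is my flight on time?"}],
    options={"temperature": 0.15},  # Ministral-3's recommended instruct temperature
)
```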
You have probably thought about this more thoroughly than I have, but...

IMO memory is likely to be the difference between Siri and openclaw, i.e. limited vs. limitless. In other words, it's complicated. But there has to be a lot of research on this, so it's probably not necessary to reinvent the wheel.
@Sophist-UK, great comment, and you're touching on something I've been thinking about a lot. You're right that memory has layers.

Short-term is step one - this PR covers transient data like flight numbers and package tracking, things that are useful for a few days and then expire. TTL-based, simple, predictable.

Long-term memory is planned as well. Thinking graph-based (something like Graphiti) with embeddings for contextual retrieval. This is where "Corey prefers morning flights" or relationships between contacts and preferences would live. Your points about forgetting and fuzzy contextual search are spot on for that layer - metadata-driven expiry and hybrid search (semantic + keyword) are likely where that heads. The trick is to retrieve that information when necessary and inject it.

Where CAAL's approach differs from what you might be picturing is the role memory plays. In CAAL's architecture, the LLM is a router - it decides which tool to call and with what parameters, and then deterministic n8n workflows execute. Memory serves that routing. When you say "is my flight on time?" the model needs to know which flight so it can call the right tool with the right parameters. It's not accumulating capability or learning new skills - memory is data that gives it enough context to make better routing decisions.

Skills in CAAL are n8n workflows. They go through review (automated + human) before they're live. The model can build new workflows (we showed this in a previous video), but it has to be prompted to do so, and the method is calling another workflow that uses a larger LLM to generate the workflow. That boundary is intentional - it's what lets an 8B model be reliable and secure. The model doesn't need to be smart enough to self-improve; it needs to be smart enough to route.

So to your five points:
- 1 and 2 - yes, layered memory with graph + embeddings is on the roadmap.
- 3 - agreed, and scoping what the model can do with memory (route, not execute) helps bound that risk.
- 4 - absolutely, TTL is built into this PR, and long-term will need smarter expiry.
- 5 - contextual retrieval is key for the long-term layer.

Appreciate the thoughtful input. This is exactly the kind of discussion that helps shape the architecture. Any experience with Graphiti or similar?

cmac
Summary
- memory_hint auto-store, explicit memory_short tool (store/get/delete/list), and HTTP API
- caal-memory volume with 7-day default TTL
- Tool chaining: memory_short (get) → flight_tracker
- Permission fixes for /app/registry_cache.json and memory persistence

Architecture
Three storage paths:
- memory_hint in response → auto-stored
- memory_short (store)
- POST /memory for external systems

Context injection serves as an awareness layer: the LLM sees what's in memory so it knows to chain tools (e.g. pull an email address from memory → send via Gmail), but retrieval still goes through the tool for verification.
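For the third path, an external system could write an entry over HTTP; the POST /memory endpoint comes from this PR, but the host, port, and payload fields below are illustrative:

```python
import requests

# Hypothetical call to the memory HTTP API; only the /memory path is
# from the PR, the rest of this request is an assumed shape.
resp = requests.post(
    "http://localhost:8080/memory",
    json={"key": "package_tracking", "value": "1Z999AA10123456784", "ttl": 604800},
    timeout=5,
)
resp.raise_for_status()
```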
Test plan
🤖 Generated with Claude Code